Effectiveness of Automated Chinese Sentence Scoring with Latent Semantic Analysis
Abstract
Automated scoring by means of Latent Semantic Analysis (LSA) has recently been introduced to improve the traditional human scoring system. The purposes of the present study were to develop an LSA-based assessment system to evaluate children's Chinese sentence construction skills and to examine the effectiveness of LSA-based automated scoring by comparing it with traditional human scoring. Twenty-seven fourth graders and thirty-one sixth graders were assessed on a single-character sentence construction test (subtest 1) and a two-character words sentence construction test (subtest 2). The scores produced by LSA-based automated scoring in three Chinese semantic spaces, generated with three types of weighting function, were compared with traditional human scoring. The results showed that LSA-based automated scoring in all three Chinese semantic spaces was highly correlated with traditional human scoring on the single-character sentence construction test and moderately correlated on the two-character words sentence construction test. The Chinese semantic space generated with Log-IDF outperformed those generated with the other two weighting functions.
INTRODUCTION
Writing skills are important for children's overall attainment. Writing is probably one of the few skills learned in school that will be used often later in life. It is an essential element of children's education and affects their achievement across the whole curriculum. Writing is also a means of communication; it allows children to participate actively in learning by sharing ideas, experiences, thoughts, and feelings (Huang, Liu, & Hsiao, 2008). Effective writing, which requires clarity, coherence, organization, and accurate grammar, is difficult to achieve because it involves complex physical and mental processes.
One fundamental aspect of learning to write is constructing complete and grammatically correct sentences (Chik, Ho, Yeung, Wong, Chan, Chung, & Lo, 2010; Chik, Ho, Yeung, Chan, & Luan, 2011; Saddler, 2005). Sentence construction can be as difficult a skill to assess as it is to learn: reliable assessment requires a set of well-developed criteria and a significant amount of time devoted to scoring. In the present study, an automated scoring system based on Latent Semantic Analysis (LSA) was developed to assess children's Chinese sentence construction skills. The system was designed as a pedagogical tool that provides instant computer-generated scores for sentence construction and reduces the heavy scoring workload. LSA is a theory and method for extracting and representing the contextual-usage meaning of words by statistical computations applied to a large corpus of text (Landauer & Dumais, 1997; Landauer, McNamara, Dennis, & Kintsch, 2007). It is closely related to neural network models, but it is based on singular value decomposition (SVD), which LSA uses to condense a large corpus of texts into 100 to 500 dimensions (Landauer, Foltz, & Laham, 1998; Landauer et al., 2007). A few studies have applied LSA in educational settings. For example, Millis, Magliano, Wiemer-Hastings, Todaro, and McNamara (2007) assessed reading comprehension skills with LSA and found that LSA predicted reading comprehension skills and identified readers' overall reading strategies. LSA has also been used to develop computer tutors that provide instant feedback and teach conceptual knowledge to learners in Newtonian physics (VanLehn, Graesser, Jackson, Jordan, Olney, & Rosé, 2007) and computer literacy (Graesser, Lu, Jackson, Mitchell, Ventura, Olney, & Louwerse, 2004).
TOJET: The Turkish Online Journal of Educational Technology – April 2012, volume 11 Issue 2. Copyright The Turkish Online Journal of Educational Technology.
Moreover, Graesser and his colleagues (Graesser, McNamara, Louwerse, & Cai, 2004; Graesser & McNamara, 2011; Graesser, McNamara, & Kulikowich, 2011) developed Coh-Metrix, a system that incorporates LSA, to help select appropriate texts for readers at different levels by providing multilevel analyses of text characteristics. Past studies have shown that LSA has enormous practical value in education; so far, however, LSA has not replaced traditional human scoring. Therefore, the present study aimed to develop an automated scoring system for Chinese sentence construction skills with LSA, comparing the effects of three semantic spaces established with different types of weighting function (Log-Entropy, Log-IDF, TF-IDF). A few studies have discussed the utility of different weighting functions in LSA and found that Log-Entropy gave better results than the other methods proposed (Dumais, 1991; Lintean, Moldovan, Rus, & McNamara, 2010; Nakov, Popova, & Mateev, 2001). Thus, Log-Entropy is generally used in applications to develop the LSA semantic space (Chen, Wang, & Ko, 2009; Quesada, 2006). Nevertheless, empirical evidence on the application of various types of weighting function in LSA is still scarce. In this study, three semantic spaces were developed using three types of weighting function, and their performance was examined. Finally, the effectiveness of the LSA-based automated scoring system was examined through the correlations between human scoring and LSA-based automated scoring.
Latent Semantic Analysis
To use LSA, a semantic space must first be established from a type-by-document matrix built from a given corpus, in which each row stands for a unique type and each column stands for a document.
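To make the three weighting functions compared in this study concrete, here is a minimal sketch that builds a small type-by-document count matrix and applies Log-Entropy, Log-IDF, and TF-IDF weighting to it. The toy corpus, the function names, and the exact formulas are assumptions for illustration only: this excerpt does not state the formulas the study used, so the sketch follows common formulations, with local weight log(1 + tf) and a global weight that is either an inverse document frequency log(n / df) or an entropy term 1 + Σ p log p / log n, where p = tf / gf.

```python
import math

# Hypothetical toy corpus: each document is a list of already-segmented types.
# The study's actual corpus and segmentation are not described in this excerpt.
DOCS = [
    ["我", "在", "院子", "裡", "玩", "遊戲"],
    ["我", "在", "家", "玩"],
    ["院子", "裡", "有", "花"],
]


def type_by_document(docs):
    """Build the type-by-document count matrix: rows = types, columns = documents."""
    types = sorted({t for doc in docs for t in doc})
    row_of = {t: i for i, t in enumerate(types)}
    matrix = [[0] * len(docs) for _ in types]
    for j, doc in enumerate(docs):
        for t in doc:
            matrix[row_of[t]][j] += 1  # frequency of type t in document j
    return types, matrix


def _idf(row, n_docs):
    """Inverse document frequency log(n / df) for one type (one common variant)."""
    df = sum(1 for count in row if count > 0)
    return math.log(n_docs / df)


def tf_idf(matrix):
    """Raw term frequency times IDF."""
    n = len(matrix[0])
    return [[count * _idf(row, n) for count in row] for row in matrix]


def log_idf(matrix):
    """Log-damped frequency log(1 + tf) times IDF."""
    n = len(matrix[0])
    return [[math.log1p(count) * _idf(row, n) for count in row] for row in matrix]


def log_entropy(matrix):
    """log(1 + tf) times an entropy-based global weight (needs at least 2 documents)."""
    n = len(matrix[0])
    weighted = []
    for row in matrix:
        gf = sum(row)  # global frequency of the type across the corpus
        entropy = sum((c / gf) * math.log(c / gf) for c in row if c > 0)
        global_weight = 1 + entropy / math.log(n)  # 1 = concentrated, 0 = evenly spread
        weighted.append([math.log1p(count) * global_weight for count in row])
    return weighted


types, counts = type_by_document(DOCS)
for name, weight in [("TF-IDF", tf_idf), ("Log-IDF", log_idf), ("Log-Entropy", log_entropy)]:
    print(name, [round(w, 3) for w in weight(counts)[types.index("我")]])
```

Under these formulas, a type that occurs in only one document keeps a full Log-Entropy global weight of 1, while a type spread evenly over every document gets 0; TF-IDF and Log-IDF differ only in whether the raw count is damped by the logarithm.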
Each element of the type-by-document matrix contains the frequency with which the type of its row appears in the document denoted by its column. These frequencies are often transformed to weight each type by its estimated importance, in order to better mimic the human comprehension process (Landauer et al., 1998; Martin & Berry, 2007; He, Hui, & Quan, 2009; Olmos, León, Escudero, & Jorge-Botana, 2011). Next, SVD (singular value decomposition) and dimension reduction are applied to the type-by-document matrix. SVD is the method LSA uses to decompose the type-by-document input matrix $A$. For an $m \times n$ type-by-document input matrix $A$ of rank $r$, the SVD is defined as:
$$A = U \Sigma V^{T} \qquad (1)$$
where $U$ is an $m \times m$ orthogonal matrix, $V$ is an $n \times n$ orthogonal matrix, and $\Sigma$ is an $m \times n$ diagonal matrix whose first $r$ diagonal entries are the singular values of $A$ and whose remaining cells are all zeros (Berry & Browne, 2005; Golub & van Loan, 1989). Dimension reduction is used to remove the extraneous information and variability in the type and document vectors, which is referred to as "noise". Keeping only the $k$ largest singular values and the corresponding columns of $U$ and $V$ gives the best rank-$k$ approximation to $A$, $A_k = U_k \Sigma_k V_k^{T}$. A pictorial representation of the SVD of the input matrix $A$ and of the best rank-$k$ approximation to $A$ is shown in Figure 1 (Berry, Dumais, & O'Brien, 1995; Martin & Berry, 2007; Witter & Berry, 1998).
Figure 1. Diagram of the truncated SVD
After SVD and dimension reduction, $A_k$ defines the $k$-dimensional vector space called the "semantic space".
Objectives of the study
1. To develop an LSA-based assessment system to evaluate children's Chinese sentence construction skills. For this purpose, a single-character sentence construction test (subtest 1) and a two-character words sentence construction test (subtest 2) were constructed by two instructors from a language and literacy education department.
2. To examine the effectiveness of the LSA-based automated scoring function by comparing it with traditional human scoring.
To develop the automated scoring system, LSA was employed, and the effectiveness of the automated scoring system was examined by comparing the results obtained by human raters and by the system. In addition, the effects of three semantic spaces established with different types of weighting function (Log-Entropy, Log-IDF, TF-IDF) were also examined.
Research Questions
1. Does the LSA-based automated scoring system score children's performance on sentence construction tests as well as human raters do?
2. Does the Chinese semantic space generated with Log-Entropy outperform the Chinese semantic spaces generated with Log-IDF and TF-IDF?
METHOD
Participants
There were 58 participants in total (27 fourth graders and 31 sixth graders) at Sin-Yi Elementary School in Taichung, Taiwan. The mean age of the participants was 10.8 years (range 9.3 to 12.2, SD = 1.03). None of the children had previously been diagnosed with any emotional, behavioural, or sensory difficulties.
Sentence Construction Tests
Sentence construction skills were assessed with two subtests: a single-character sentence construction test (subtest 1) and a two-character words sentence construction test (subtest 2). The subtests took approximately 40 minutes to complete. All the tests were computerized.
Single-character sentence construction test (subtest 1)
There were two practice trials and 10 test trials. In each trial, Chinese single characters were displayed in a row in random order. Participants were asked to rearrange all the given characters to construct a complete and grammatically correct sentence (an example item is shown in Table 1). The number of characters in each test item ranged from 8 to 16. The interface and instructions of the single-character sentence construction test are illustrated in Figure 2.
Table 1.
An example of single-character sentence construction test
Item: 裡、在、遊、院、玩、戲、我、子
Answer: 我在院子裡玩遊戲 (I play games in the yard)
Figure 2. Interface of the single-character sentence construction test. On-screen instructions: "There are single characters distributed in random order. Please rearrange the characters to construct a complete and grammatically correct sentence. All characters need to be used and can only be used once. Press the button to start." Answer box: participants enter their responses.
Two-character words sentence construction test (subtest 2)
There were two practice trials and 10 test trials. In each trial, Chinese two-character words were displayed in a row in random order. Participants were asked to rearrange all the words provided to construct a complete and grammatically correct sentence (an example item is shown in Table 2). The number of words in each test item ranged from 5 to 8. The interface and instructions of the two-character words sentence construction test are illustrated in Figure 3.
Table 2. An example of two-character words sentence construction test
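The on-screen rule that every given character must be used, and used only once, amounts to a permutation (multiset equality) check on the response. The sketch below assumes responses arrive as a sequence of units and demonstrates the check on the Table 1 item; the function names are hypothetical and are not taken from the study's software. It covers only this usage rule, not grammaticality, which is what the human raters and the LSA scoring assessed.

```python
from collections import Counter


def parse_item(item, sep="、"):
    """Split an item as printed in Table 1 (units separated by '、') into a list of units."""
    return [unit.strip() for unit in item.split(sep) if unit.strip()]


def uses_each_unit_once(given_units, response_units):
    """True iff the response is a rearrangement of exactly the given units.

    This checks only the "use every unit, each exactly once" rule; it says
    nothing about whether the sentence is grammatical or meaningful.
    """
    return Counter(given_units) == Counter(response_units)


# Subtest 1 example from Table 1: the units are single characters.
item = parse_item("裡、在、遊、院、玩、戲、我、子")
print(uses_each_unit_once(item, list("我在院子裡玩遊戲")))  # True
print(uses_each_unit_once(item, list("我在院子玩遊戲")))    # False: 裡 is missing
```

For subtest 2, the same comparison would be applied to lists of two-character words rather than characters; how the study's software actually segmented and validated responses is not described in this excerpt.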
Similar resources
SCESS: a WFSA-based automated simplified Chinese essay scoring system with incremental latent semantic analysis
Writing in language tests is regarded as an important indicator for assessing language skills of test takers. As Chinese language tests become popular, scoring a large number of essays becomes a heavy and expensive task for the organizers of these tests. In the past several years, some efforts have been made to develop automated simplified Chinese essay scoring systems, reducing both costs and ...
Parameters Driving Effectiveness of Automated Essay Scoring with LSA
Automated essay scoring with latent semantic analysis (LSA) has recently been subject to increasing interest. Although previous authors have achieved grade ranges similar to those awarded by humans, it is still not clear which and how parameters improve or decrease the effectiveness of LSA. This paper presents an analysis of the effects of these parameters, such as text preprocessing, weighting...
Factors Influencing Effectiveness in Automated Essay Scoring with LSA
This paper addresses the ongoing discussion on influencing factors of automatic essay scoring with latent semantic analysis (LSA). Throughout this paper, we contribute to this discussion by presenting evidence for the effects of the parameters text pre-processing, weighting, singular value dimensionality and type of similarity measure on the scoring results. We benchmark this effectiveness by c...
Presentation of an efficient automatic short answer grading model based on combination of pseudo relevance feedback and semantic relatedness measures
Automatic short answer grading (ASAG) is the automated process of assessing answers based on natural language using computation methods and machine learning algorithms. Development of large-scale smart education systems on one hand and the importance of assessment as a key factor in the learning process and its confronted challenges, on the other hand, have significantly increased the need for ...
متن کاملPresentation of an efficient automatic short answer grading model based on combination of pseudo relevance feedback and semantic relatedness measures
Automatic short answer grading (ASAG) is the automated process of assessing answers based on natural language using computation methods and machine learning algorithms. Development of large-scale smart education systems on one hand and the importance of assessment as a key factor in the learning process and its confronted challenges, on the other hand, have significantly increased the need for ...
متن کامل